Corpora for computational linguistics 1 Running head : CORPORA FOR COMPUTATIONAL LINGUISTICS Corpora for computational linguistics

نویسندگان

  • Constantin Orăsan
  • Le An Ha
  • Richard Evans
  • Laura Hasler
  • Ruslan Mitkov
چکیده

Since the mid 90s corpora has become very important for computational linguistics. This paper offers a survey of how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction. Their influence on other fields is also briefly discussed. Corpora for computational linguistics 3 Corpora for computational linguistics

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Electronic Corpora: As Powerful Tools in Computational Linguistic Analyses

Technology has emerged almost all the domains in our daily life. In computational linguistics, the uses of electronic corpora are very important. Nowadays it is possible to study linguistic phenomena by using statistical analyses: Concordances, collocations and frequencies have great influence in making linguistic researches more available, more adequate and more accurate.

متن کامل

Parallel Corpora for the Galician Language: Building and Processing of the CLUVI (Linguistic Corpus of the University of Vigo)

In this paper, we present the methodology developed by the SLI (Computational Linguistics Group of the University of Vigo) for the building and processing of the CLUVI Corpus, showing the TMX-based XML specification designed to encode both morphosyntactic features and translation alignments in parallel corpora, and the solutions adopted for making the CLUVI parallel corpora freely available ove...

متن کامل

Towards an integrated representation of multiple layers of linguistic annotation in multilingual corpora

There has been an increasing interest in recent years in the enrichment of natural language corpora in terms of annotation with explicit linguistic information. This interest manifests itself most prominently in two areas of linguistics: corpus linguistics and computational linguistics. For corpus linguistics, the long standing practice has been to work on raw, i.e., unannotated text. While raw...

متن کامل

Multi-level annotation of linguistic data with MMAX2

This paper describes how richly annotated corpora can be created with the annotation tool MMAX2. The description is from the point of view of Computational Linguistics, a discipline where annotated corpora are often used as resources for software development. The paper outlines the important steps in the life cycle of an annotation and details how the tool MMAX2 can be employed in each of them.

متن کامل

Shared Corpora Working Group Report

We seek to identify a limited amount of representative corpora, suitable for annotation by the computational linguistics annotation community. Our hope is that a wide variety of annotation will be undertaken on the same corpora, which would facilitate: (1) the comparison of annotation schemes; (2) the merging of information represented by various annotation schemes; (3) the emergence of NLP sys...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006